Kaiwen Zheng

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues

Jan 28, 2026
Via arXiv

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions

Jan 01, 2026
Via arXiv

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Dec 19, 2025
Via arXiv

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 18, 2025
Via arXiv

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Sep 19, 2025
Via arXiv

Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities

Aug 10, 2025
Via arXiv

Bridging Supervised Learning and Reinforcement Learning in Math Reasoning

May 23, 2025
Via arXiv

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

Apr 14, 2025
Via arXiv

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

Apr 14, 2025
Via arXiv

Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator

Mar 03, 2025
Via arXiv